A diagnosis-oriented dialogue system queries the patient's health condition and predicts possible diseases through continuous interaction with the patient. A few studies use reinforcement learning (RL) to learn the optimal policy over the joint action space of symptoms and diseases. However, existing RL and non-RL methods do not achieve sufficiently good prediction accuracy, which remains far from its upper limit. To address this problem, we propose DxFormer, a decoupled automatic diagnostic framework that divides the diagnosis process into two steps, symptom inquiry and disease diagnosis, where the transition from symptom inquiry to disease diagnosis is explicitly determined by stopping criteria. In DxFormer, we treat each symptom as a token and formalize symptom inquiry and disease diagnosis as a language generation model and a sequence classification model, respectively. We use an inverted version of the Transformer, i.e., a decoder-encoder structure, to learn symptom representations by jointly optimizing the reinforce reward and the cross-entropy loss. Extensive experiments on three public real-world datasets show that the proposed model can effectively learn doctors' clinical experience and achieves state-of-the-art results in terms of symptom recall and diagnostic accuracy.
translated by 谷歌翻译
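The two-step process described in the DxFormer abstract (symptom inquiry, then disease diagnosis, separated by a stopping criterion) can be pictured with a minimal sketch. Everything here is an assumption made for illustration: `run_consultation`, the callables it takes, the confidence threshold used as the stopping criterion, and the toy stubs are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Optional, Tuple


def run_consultation(
    explicit_symptoms: Dict[str, bool],
    next_symptom: Callable[[Dict[str, bool]], Tuple[Optional[str], float]],
    ask_patient: Callable[[str], bool],
    classify: Callable[[Dict[str, bool]], str],
    min_confidence: float = 0.5,
    max_turns: int = 10,
) -> Tuple[str, List[str]]:
    """Step 1: inquire about symptoms until a stopping criterion fires.
    Step 2: diagnose from everything collected so far."""
    known = dict(explicit_symptoms)
    asked: List[str] = []
    for _ in range(max_turns):
        symptom, confidence = next_symptom(known)
        # Hypothetical stopping criterion: nothing left to ask, or the
        # inquiry model is not confident enough in its next question.
        if symptom is None or confidence < min_confidence:
            break
        known[symptom] = ask_patient(symptom)
        asked.append(symptom)
    return classify(known), asked


def toy_next_symptom(known: Dict[str, bool]) -> Tuple[Optional[str], float]:
    """Toy stand-in for the inquiry model: a fixed ranked list."""
    for name, confidence in [("fever", 0.9), ("cough", 0.7), ("rash", 0.2)]:
        if name not in known:
            return name, confidence
    return None, 0.0


def toy_classify(known: Dict[str, bool]) -> str:
    """Toy stand-in for the diagnosis model."""
    return "flu" if known.get("fever") and known.get("cough") else "unknown"
```

The point of the decoupling is that the inquiry loop and the classifier only share the collected symptom set, so each can be trained or replaced on its own.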
In recent years, interest has arisen in using machine learning to improve the efficiency of automatic medical consultation and enhance patient experience. In this article, we propose two frameworks to support automatic medical consultation, namely doctor-patient dialogue understanding and task-oriented interaction. We create a new large medical dialogue dataset with multi-level fine-grained annotations and establish five independent tasks, including named entity recognition, dialogue act classification, symptom label inference, medical report generation, and diagnosis-oriented dialogue policy. We report a set of benchmark results for each task, which shows the usability of the dataset and sets a baseline for future studies. Both code and data are available at https://github.com/lemuria-wchen/imcs21.
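Current pre-trained language models (PLMs) are typically trained on static data, ignoring that in real-world scenarios streaming data from various sources may grow continuously. This requires PLMs to integrate information from all sources in a lifelong manner. Although this goal could be achieved by exhaustive pre-training on all existing data, such a process is known to be computationally expensive. To this end, we propose ELLE, which aims at efficient lifelong pre-training for emerging data. Specifically, ELLE consists of (1) function-preserved model expansion, which flexibly expands the width and depth of an existing PLM to improve the efficiency of knowledge acquisition; and (2) pre-trained domain prompts, which disentangle the versatile knowledge learned during pre-training and stimulate the proper knowledge for downstream tasks. We experiment with streaming data from 5 domains on BERT and GPT. The results show the superiority of ELLE over various lifelong learning baselines in both pre-training efficiency and downstream performance. The code is publicly available at https://github.com/thunlp/elle.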
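To make "function-preserved" width expansion concrete, here is a minimal sketch on a tiny two-layer ReLU network. It uses a generic Net2Net-style trick (duplicate a hidden unit and halve its outgoing weights); the abstract does not describe ELLE's actual procedure, so this is an assumed stand-in, and `forward` and `widen_hidden` are hypothetical names.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def forward(x: List[float], w1: Matrix, b1: List[float],
            w2: Matrix, b2: List[float]) -> List[float]:
    """Two-layer MLP: y = W2 * relu(W1 * x + b1) + b2."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]


def widen_hidden(w1: Matrix, b1: List[float], w2: Matrix,
                 unit: int) -> Tuple[Matrix, List[float], Matrix]:
    """Add one hidden unit by duplicating `unit`, keeping the output unchanged."""
    # The new unit copies the incoming weights and bias of the original,
    # so both units produce the same activation.
    new_w1 = [row[:] for row in w1] + [w1[unit][:]]
    new_b1 = b1[:] + [b1[unit]]
    # Split the outgoing weight evenly between the original and the copy,
    # so their summed contribution equals the original contribution.
    new_w2 = []
    for row in w2:
        half = row[unit] / 2.0
        new_row = row[:]
        new_row[unit] = half
        new_row.append(half)
        new_w2.append(new_row)
    return new_w1, new_b1, new_w2
```

After widening, the larger network computes the same function as the smaller one, so training can continue on new data without first losing what was learned.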
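Model-based reinforcement learning (MBRL) methods achieve significant sample efficiency in many tasks, but their performance is often limited by model error. To reduce model error, previous works use a single well-designed network to fit the entire environment dynamics, treating the dynamics as a black box. However, these methods do not consider the decomposability of the environment: the dynamics may contain multiple sub-dynamics that can be modeled separately, allowing the world model to be constructed more accurately. In this paper, we propose Environment Dynamics Decomposition (ED2), a novel world-model construction framework that models the environment in a decomposed manner. ED2 contains two key components: sub-dynamics discovery (SD2) and dynamics decomposition prediction (D2P). SD2 discovers the sub-dynamics in an environment, and D2P then constructs the decomposed world model following those sub-dynamics. ED2 can be easily combined with existing MBRL algorithms, and empirical results show that ED2 significantly reduces model error and boosts the performance of state-of-the-art MBRL algorithms on various tasks.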
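Vision-and-Language Navigation (VLN) is a task in which an agent navigates an embodied indoor environment under human instructions. Previous works ignore the distribution of sample difficulty, and we argue that this may degrade agent performance. To tackle this issue, we propose a novel curriculum-based training paradigm for VLN tasks that balances human prior knowledge and the agent's learning progress on the training samples. We develop principles for curriculum design and re-arrange the benchmark Room-to-Room (R2R) dataset to make it suitable for curriculum training. Experiments show that our method is model-agnostic and can significantly improve the performance, generalizability, and training efficiency of current state-of-the-art navigation agents without increasing model complexity.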
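As a rough illustration of curriculum training in general, the sketch below orders samples from easy to hard and grows the training pool stage by stage. It covers only a fixed difficulty score; the learning-progress side mentioned in the abstract is not modeled, and `curriculum_stages` and its arguments are hypothetical names, not the paper's API.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(samples: Sequence[T], difficulty: Sequence[float],
                      num_stages: int) -> List[List[T]]:
    """Return one training pool per stage, each a superset of the previous.

    `difficulty` is an assumed per-sample score (lower means easier), for
    example a hand-assigned level based on how long the path is.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    if len(samples) != len(difficulty):
        raise ValueError("samples and difficulty must have the same length")
    # Indices sorted from easiest to hardest (stable for ties).
    order = sorted(range(len(samples)), key=lambda i: difficulty[i])
    stages: List[List[T]] = []
    for stage in range(1, num_stages + 1):
        # Ceiling division so the final stage always contains every sample.
        cutoff = (len(order) * stage + num_stages - 1) // num_stages
        stages.append([samples[i] for i in order[:cutoff]])
    return stages
```

A training loop would then iterate over the stages in order, running some number of epochs on each pool before moving to the next.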
In recent years, arbitrary image style transfer has attracted increasing attention. Given a pair of content and style images, the goal is a stylized image that retains the content of the former while capturing the style patterns of the latter. However, it is difficult to balance content details against style features. Stylizing the image with sufficient style patterns may damage the content details, and sometimes the objects in the image can no longer be distinguished clearly. For this reason, we present STT, a new transformer-based method for image style transfer, together with an edge loss that noticeably enhances content details and avoids the blurred results caused by excessive rendering of style features. Qualitative and quantitative experiments demonstrate that STT achieves performance comparable to state-of-the-art image style transfer methods while alleviating the content leak problem.
In recent years, the Transformer architecture has shown its superiority in the video-based person re-identification task. Inspired by video representation learning, these methods mainly focus on designing modules to extract informative spatial and temporal features. However, they are still limited in extracting local attributes and global identity information, which are critical for person re-identification. In this paper, we propose a novel Multi-Stage Spatial-Temporal Aggregation Transformer (MSTAT) with two newly designed proxy embedding modules to address this issue. Specifically, MSTAT consists of three stages that encode the attribute-associated, the identity-associated, and the attribute-identity-associated information from the video clips, respectively, achieving a holistic perception of the input person. We combine the outputs of all stages for the final identification. In practice, to reduce the computational cost, Spatial-Temporal Aggregation (STA) modules are first adopted in each stage to conduct self-attention along the spatial and temporal dimensions separately. We further introduce the Attribute-Aware and Identity-Aware Proxy embedding modules (AAP and IAP) to extract informative and discriminative feature representations at different stages. All of them are realized with newly designed self-attention operations with specific meanings. Moreover, temporal patch shuffling is introduced to further improve the robustness of the model. Extensive experimental results demonstrate the effectiveness of the proposed modules in extracting informative and discriminative information from videos, and show that MSTAT achieves state-of-the-art accuracy on various standard benchmarks.
Machine learning models are typically evaluated by computing similarity with reference annotations and trained by maximizing similarity with them. Especially in the biomedical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotating entity's interpretation of the real world, this can lead to sub-optimal predictions even when the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating into better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
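One way to picture a reliability-based approximation of PGT is to measure how well human raters agree with each other and treat that agreement as a rough ceiling for meaningful model scores. The sketch below is an assumption-laden illustration: the abstract does not give the formula, so the use of the Dice coefficient, the mean over rater pairs, and the names `dice`, `mean_pairwise_dice`, and `beyond_reliability` are all hypothetical choices.

```python
from itertools import combinations
from typing import Sequence, Set


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice overlap between two annotations given as sets of labeled indices."""
    if not a and not b:
        return 1.0  # two empty annotations agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def mean_pairwise_dice(annotations: Sequence[Set[int]]) -> float:
    """Inter-rater reliability as the mean Dice over all rater pairs."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("need at least two annotations")
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


def beyond_reliability(model_score: float,
                       annotations: Sequence[Set[int]]) -> bool:
    """Assumed heuristic: a model score above the raters' own agreement
    may no longer reflect real-world improvement."""
    return model_score > mean_pairwise_dice(annotations)
```

Intra-rater reliability could be estimated the same way by passing repeated annotations from a single rater instead of annotations from different raters.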
We propose a novel approach to self-supervised learning of point cloud representations by differentiable neural rendering. Motivated by the fact that informative point cloud features should be able to encode rich geometry and appearance cues and render realistic images, we train a point-cloud encoder within a purpose-built point-based neural renderer by comparing the rendered images with real images on massive RGB-D data. The learned point-cloud encoder can be easily integrated into various downstream tasks, including not only high-level tasks like 3D detection and segmentation, but also low-level tasks like 3D reconstruction and image synthesis. Extensive experiments on various tasks demonstrate the superiority of our approach compared to existing pre-training methods.
Collaboration among industrial Internet of Things (IoT) devices and edge networks is essential to support computation-intensive deep neural network (DNN) inference services, which require low delay and high accuracy. Sampling rate adaption, which dynamically configures the sampling rates of industrial IoT devices according to network conditions, is key to minimizing the service delay. In this paper, we investigate the collaborative DNN inference problem in industrial IoT networks. To capture channel variation and task arrival randomness, we formulate the problem as a constrained Markov decision process (CMDP). Specifically, sampling rate adaption, inference task offloading, and edge computing resource allocation are jointly considered to minimize the average service delay while guaranteeing the long-term accuracy requirements of different inference services. Since the CMDP cannot be directly solved by general reinforcement learning (RL) algorithms due to the intractable long-term constraints, we first transform the CMDP into an MDP by leveraging the Lyapunov optimization technique. Then, a deep RL-based algorithm is proposed to solve the MDP. To expedite the training process, an optimization subroutine is embedded in the proposed algorithm to directly obtain the optimal edge computing resource allocation. Extensive simulation results demonstrate that the proposed RL-based algorithm can significantly reduce the average service delay while preserving long-term inference accuracy with high probability.
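The step of turning a long-term constraint into a per-step quantity can be sketched with the standard Lyapunov drift-plus-penalty construction. This is a generic textbook form under assumed names (`update_virtual_queue`, `drift_plus_penalty_reward`, `tradeoff_v`); the abstract does not give the paper's exact formulation, so treat it as an illustration only.

```python
def update_virtual_queue(queue: float, accuracy: float,
                         accuracy_target: float) -> float:
    """Virtual queue tracking the accumulated accuracy deficit.

    The queue grows when the achieved accuracy falls short of the target
    and drains when accuracy exceeds it, never dropping below zero.
    """
    return max(queue + accuracy_target - accuracy, 0.0)


def drift_plus_penalty_reward(queue: float, delay: float, accuracy: float,
                              accuracy_target: float,
                              tradeoff_v: float) -> float:
    """Per-step reward for an unconstrained MDP.

    The penalty term `tradeoff_v * delay` pushes toward low delay, and the
    drift term `queue * (accuracy_target - accuracy)` penalizes accuracy
    shortfalls more heavily the larger the accumulated deficit is.
    """
    return -(tradeoff_v * delay + queue * (accuracy_target - accuracy))
```

Keeping the virtual queue stable over time is what enforces the long-term accuracy requirement, while `tradeoff_v` sets how strongly delay is weighted against that requirement.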